---
title: "ManyPrimates pilot mixed modeling"
author: "Drew Altschul"
date: "18 Oct 2018"
output:
  html_notebook:
    css: style.css
    theme: paper
    toc: yes
    toc_float: yes
---

```{r setup, message=FALSE}
library(tidyverse)
library(knitr)
library(sjPlot)
library(lme4)
library(emmeans)
library(car) # for vif
library(bbmle) # for AICtab
library(broom) # for glance

theme_set(ggthemes::theme_few())
```

# Summary

Mixed modeling with all relevant variables predicting accuracy.

From the preregistration, the mixed model was specified as follows:

```
correct ~ delay * age + 
          task_experience + cup_distance + board_size + trial +
          (1 + delay + trial | site/subject/block/hiding_location ) + 
          (1 + task_experience + cup_distance + board_size + trial + delay | species)
```

In the dataframe, `subject_site` corresponds to `subject`, and `norm_age` should be used for `age`.

The model as pre-registered has too many random effects:

```
Error: number of observations (=6246) < number of random effects (=10608) for term (1 + delay + trial | hiding_location:(block:(subject_site:site))); the random-effects parameters are probably unidentifiable
```

Prune random effects in the following order (from the preregistration):

> - Remove correlations between random effects
> - Remove random slopes (in the following order)
>     - `species`
>     - `hiding_location`
>     - `block`
>     - `subject`
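For reference, the first pruning step — removing random-effect correlations — uses lme4's double-bar syntax (a sketch, not code from the preregistration; note that `||` only expands numeric covariates, so factor terms such as `delay` need manual dummy coding or `afex::mixed(..., expand_re = TRUE)`):

```
# With correlations:     (1 + trial | site)
# Without correlations:  (1 + trial || site)
# which is equivalent to (1 | site) + (0 + trial | site)
```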

The model only converges once we take out `hiding_location`. After doing so, the other random effects (correlations, `site`, `species`) can be put back in.

The model below converges. Its output is saved in `06_mp_model_v2.rds`.

```
correct ~ delay * norm_age + 
          task_experience + cup_distance + board_size + trial + 
          (1 + delay + trial | site/subject_site/block) + 
          (1 + task_experience + cup_distance + board_size + trial + delay | species)
```

## Reduced model

After pruning random effects with little variability and removing `board_size`, which covaried with `cup_distance`, the reduced model has the following structure. It is saved in `06_3_mp_model3_v2.rds`.

```
correct ~ delay * norm_age + 
          task_experience + cup_distance + trial + 
          (1 + delay | site/subject_site) + 
          (1 + delay | species)
```

![](../graphs/05_forestplot.png)

***

# Data prep

Data import

```{r loading data}
mp_data <- read.csv("../data/merged_data/01_manyprimates_pilot_merged_data_v2.csv")
```

Prepare code for pre-registered mixed modeling

- center and scale `cup_distance`, `board_size`, and `trial`
- filter out the black-faced spider monkey: with only one data point so far, it is not worth inflating the number of random effects to include it

```{r}
model.data <- mp_data %>%
  filter(species != "black_faced_spider_monkey") %>%
  # standardize the covariates (across() replaces the deprecated mutate_at()/funs())
  mutate(across(c(cup_distance, board_size, trial), ~ scale(.x)[, 1])) %>%
  mutate(hiding_location = factor(hiding_location),
         delay = fct_relevel(delay, "short"))
```
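
As a quick sanity check (not part of the original script), the standardized covariates should come out with mean ≈ 0 and SD = 1:

```{r}
model.data %>%
  summarise(across(c(cup_distance, board_size, trial),
                   list(mean = mean, sd = sd)))
```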

# Model 1

The model takes a while to run. Run the next chunk to load the model output from a previous run; the unevaluated chunk after it refits the model with the structure below.

```{r}
# mm.1 <- readRDS("06_mp_model.rds")
mm.1 <- readRDS("06_mp_model_v2.rds")
```

```{r, eval=FALSE}
mm.1 <- glmer(correct ~ delay * norm_age +
               task_experience + cup_distance + board_size + trial +
               (1 + delay + trial | site/subject_site/block) +
               (1 + task_experience + cup_distance + board_size + trial + delay | species)
             , data = model.data
             , family = binomial
             , control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5))
             )

saveRDS(mm.1, "06_mp_model_v2.rds")
```

Some diagnostics

- examine the Cholesky decomposition: `TRUE` below means at least one random-effect standard deviation is essentially zero, i.e. a singular fit

```{r}
theta <- getME(mm.1, "theta")
diag.element <- getME(mm.1, "lower") == 0
any(theta[diag.element] < 1e-5)
```
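
Newer versions of lme4 package this check as `isSingular()`; assuming a reasonably current lme4 (≥ 1.1-19), the following should agree with the manual check above:

```{r}
isSingular(mm.1, tol = 1e-5)
```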

## Model summary

Confirm model structure

```{r}
# mm.1@call
formula(mm.1)
```

```{r, results="asis"}
glance(mm.1) %>% kable(digits = 2)
```

## Random effects

```{r}
fmt <- function(num, digits) return(round(num, digits))
VarCorr(mm.1) %>% print(formatter = fmt, digits = 3) # comp = c("Variance", "Std.Dev.")
```

## Fixed effects

CIs

```{r}
mm.1.ci <- confint(mm.1, method = "Wald") %>% # bootstrap these later
  as.data.frame %>%
  rownames_to_column %>%
  filter(complete.cases(.)) %>%
  rename(LL = `2.5 %`, UL = `97.5 %`) %>%
  mutate(OR_LL = exp(LL), OR_UL = exp(UL))
```

```{r, results="asis"}
coef(summary(mm.1)) %>%
  as.data.frame %>%
  rownames_to_column() %>%
  mutate(OR = exp(Estimate)) %>%
  left_join(mm.1.ci, by = "rowname") %>%
  select(rowname, OR, OR_LL, OR_UL, Estimate, LL, UL, everything()) %>%
  kable(digits = 3)
```

<!-- ## Correlation of Fixed Effects -->

```{r, eval=FALSE, results="asis"}
corr <- cov2cor(vcov(mm.1)) %>% as.matrix %>% round(2)
corr[upper.tri(corr, diag = T)] <- ""
colnames(corr) <- 1:10
rownames(corr) <- str_c(1:10, " ", rownames(corr))

corr %>% as.data.frame %>% select(-10) %>% rownames_to_column
```

## Pairwise contrasts for delay

Based on estimated marginal means.

*Note. This wasn't in the preregistration.*

```{r, message=FALSE}
emmeans(mm.1, pairwise ~ delay, type = "response")$contrasts
```

# Model 1 plots

## Fixed effects

```{r, fig.width=4, fig.height=2.5, message=FALSE}
plot_model(mm.1, title = "Fixed Effects", order.terms = c(7, 4, 3:1, 9:8, 5, 6),
           width = .3, show.values = T, value.size = 2.5, value.offset = .3) +
  geom_hline(yintercept = 1, lty = 2) +
  ylim(0, 3)
```

## Random effects

```{r}
ranef.plots <- plot_model(mm.1, type = "re", sort.est = "(Intercept)")
```

### Block/Subject/Site

In line with the model summary above, there's essentially zero variability in the random effects estimates for this.

```{r fig.height=20, fig.width=10}
ranef.plots[[1]]
```

### Subject/Site

```{r, fig.width=10, fig.height=8}
ranef.plots[[2]]
```

### Site

```{r, fig.width=10, fig.height=3}
ranef.plots[[3]]
```

### Species

```{r ranef species, fig.width=8, fig.height=2}
ranef.plots[[4]]
```

***

# Pruning the model

- remove `block` from random effects as the estimates in the previous models were essentially 0
- same for `trial` random slopes within `species`

```
correct ~ delay * norm_age + 
          task_experience + cup_distance + board_size + trial +
          (1 + delay + trial | site/subject_site ) +         
          (1 + task_experience + cup_distance + board_size + delay | species)
```

## Check collinearity in the previous model

```{r}
col.mm1 <- glm(correct ~ delay + norm_age +
                 task_experience + cup_distance + board_size + trial
               , data = model.data
               , family = binomial)

vif(col.mm1)
```

`board_size` and `cup_distance` show the highest collinearity of the predictors.

Remove `board_size`, as it is highly correlated with `cup_distance`, and cup distance seems of more immediate relevance.
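
The GVIF values only summarize the joint collinearity; as a direct check (a sketch, not part of the original analysis), the pairwise correlation between the two centered covariates can be inspected:

```{r}
with(model.data, cor(cup_distance, board_size))
```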

```
correct ~ delay * norm_age + 
          task_experience + cup_distance + trial +
          (1 + delay + trial | site/subject_site ) +         
          (1 + task_experience + cup_distance + delay | species)
```

## Check levels of random effects

Check how many distinct levels each fixed-effect predictor has within `species`.

```{r}
source("diagnostic_fcns.r")

overview <- fe.re.tab("correct ~ delay + task_experience + cup_distance + trial", "species", data = model.data)

overview$summary
```
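
The same tabulation can be sketched with plain dplyr, without the helper script, by counting distinct values per species:

```{r}
model.data %>%
  group_by(species) %>%
  summarise(n_task_experience = n_distinct(task_experience),
            n_cup_distance    = n_distinct(cup_distance))
```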

This suggests that, within species, a random slope for `task_experience` does not make much sense, as most species have only one level. The same is true for `cup_distance`. Indeed, the model summary and the random-effects plot for `species` confirm that there is little variability in these estimates (they're close to zero), so they are removed.

```
correct ~ delay * norm_age + 
          task_experience + cup_distance + trial +
          (1 + delay + trial | site/subject_site ) +         
          (1 + delay | species)
```

# Model 2

The model takes a while to run. Run the next chunk to load the model output from a previous run; the unevaluated chunk after it refits the model with the structure below.

```{r}
mm.2 <- readRDS("06_2_mp_model2_v2.rds")
# mm.2.ci <- readRDS("06_2_mp_model2_ci_v2.rds")
```

```{r, eval=FALSE}
mm.2 <- glmer(correct ~ delay * norm_age +
              task_experience + cup_distance + trial +
              (1 + trial + delay | site/subject_site) +
              (1 + delay | species)
              , data = model.data
              , family = binomial
              , control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5))
              )

saveRDS(mm.2, "06_2_mp_model2_v2.rds")
```

# Model 3

Remove `trial` from the random slopes for subject/site: its variance is near zero in `mm.1` and even closer to zero in `mm.2`.

```{r}
VarCorr(mm.2) %>% print(comp = c("Variance", "Std.Dev."), formatter = fmt, digits = 3)
```

```{r, fig.width=10, fig.height=8}
plot_model(mm.2, type = "re", sort.est = "(Intercept)")[[1]]
```

```{r, fig.width=10, fig.height=3}
plot_model(mm.2, type = "re", sort.est = "(Intercept)")[[2]]
```

The model takes a while to run. Run the next chunk to load the model output from a previous run; the unevaluated chunk after it refits the model with the structure below.

```{r}
mm.3 <- readRDS("06_3_mp_model3_v2.rds")
```

```{r, eval=FALSE}
mm.3 <- glmer(correct ~ delay * norm_age +
              task_experience + cup_distance + trial +
              (1 + delay | site/subject_site) +
              (1 + delay | species)
              , data = model.data
              , family = binomial
              , control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5))
              )

saveRDS(mm.3, "06_3_mp_model3_v2.rds")
```

## Model summary

Confirm model structure

```{r}
formula(mm.3)
```

```{r, results="asis"}
glance(mm.3) %>% kable(digits = 2)
```

## LRT

```{r}
drop1(mm.3, test = 'Chisq')
```


## Random effects

```{r}
VarCorr(mm.3) %>% print(comp = c("Variance", "Std.Dev."), formatter = fmt, digits = 3)
```

## Fixed effects

CIs

<!-- Bootstrap function by Roger Mundry @ MPI EVA -->

```{r, eval=FALSE}
# this is not currently run
source("boot_glmm.r")

mm.3.ci <- boot.glmm.pred(model.res = mm.3, excl.warnings = F, nboots = 1000,
                         para = F, resol = 100, level = 0.95, use = NULL,
                         circ.var.name = NULL, circ.var = NULL, use.u = F,
                         n.cores = c("all-1", "all"), save.path = NULL)

saveRDS(mm.3.ci, "06_3_mp_model3_ci_v2.rds")
```

```{r}
mm.3.ci <- confint(mm.3, method = "Wald") %>% # bootstrap these later
  as.data.frame %>%
  rownames_to_column %>%
  filter(complete.cases(.)) %>%
  rename(LL = `2.5 %`, UL = `97.5 %`) %>%
  mutate(OR_LL = exp(LL), OR_UL = exp(UL))
```

```{r, results="asis"}
coef(summary(mm.3)) %>%
  as.data.frame %>%
  rownames_to_column() %>%
  mutate(OR = exp(Estimate)) %>%
  left_join(mm.3.ci, by = "rowname") %>%
  select(rowname, OR, OR_LL, OR_UL, Estimate, LL, UL, everything()) %>%
  kable(digits = 3)
```

## Pairwise contrasts for delay

Based on estimated marginal means.

```{r, message=FALSE}
(emm <- emmeans(mm.3, pairwise ~ delay, type = "response")$contrasts)
```

```{r, message=FALSE}
confint(emm)
```


# Model 3 plots

## Fixed effects

```{r, fig.width=4, fig.height=2.5, message=FALSE}
plot_model(mm.3, title = "Fixed Effects", order.terms = c(6, 4, 3:1, 8:7, 5),
           width = .3, show.values = T, value.size = 2.5, value.offset = .3) +
  geom_hline(yintercept = 1, lty = 2) +
  ylim(.05, 3)
```

```{r}
ggsave("../graphs/05_forestplot.png", width = 4, height = 2.5, scale = 2)
```

## Random effects

```{r}
ranef.plots2 <- plot_model(mm.3, type = "re", sort.est = "(Intercept)")
```

### Subject/Site

```{r, fig.width=10, fig.height=8}
ranef.plots2[[1]]
```

### Site

```{r, fig.width=10, fig.height=3}
ranef.plots2[[2]]
```

### Species

```{r, fig.width=4, fig.height=2}
ranef.plots2[[3]]
```

# Model 4

- additionally remove the subject/site random effects in order to examine species differences (from the preregistration)

```{r}
mm.4 <- glmer(correct ~ delay * norm_age +
                task_experience + cup_distance + trial +
                (1 + delay | species)
              , data = model.data
              , family = binomial
              , control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5))
              )
```

# Model comparison

We're looking for the lowest AICc: the model with the best fit relative to its number of parameters (AIC penalizes extra parameters as a guard against overfitting).

Indeed, the reduced model (`mm.3`) strikes the best balance, fitting the data nearly as well as the fuller models with far fewer parameters.

```{r}
AICctab(mm.1, mm.2, mm.3, mm.4, logLik = T, weights = T)
```
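
For intuition, these weights can be reproduced from plain `AIC` values (a sketch; `AICctab` uses the small-sample-corrected AICc, so the numbers differ slightly):

```{r}
aic <- AIC(mm.1, mm.2, mm.3, mm.4)$AIC
delta <- aic - min(aic)                            # difference from the best model
round(exp(-delta / 2) / sum(exp(-delta / 2)), 4)   # Akaike weights
```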

```{r}
anova(mm.1, mm.2, mm.3, mm.4)
```

## Difference in regression coefficients

Difference on the log-odds scale:

```{r}
# rows 2, 3, 6 = delaylong, delaymedium, and cup_distance in both models
coef1 <- coef(summary(mm.1))[c(2, 3, 6), 1]
coef2 <- coef(summary(mm.3))[c(2, 3, 6), 1]

coef2 - coef1
```

Difference in odds ratios

```{r}
exp(coef2) - exp(coef1)
```

